Chinese Name Disambiguation Based on Adaptive Clustering with the Attribute Features

Authors

  • Wei Tian
  • Xiao Pan
  • Zhengtao Yu
  • Yantuan Xian
  • Xiuzhen Yang
Abstract

For the CLP2012 evaluation task on Chinese named entity recognition and disambiguation, a Chinese name disambiguation method based on adaptive clustering with attribute features is proposed. First, 12-dimensional character attribute features are defined, and a tagged attribute-feature corpus is used to train an attribute recognition model with the Conditional Random Fields algorithm; this model recognizes the attributes in the given texts and knowledge bases. Second, the training samples are tagged using the correspondence between text attributes and answers, and an attribute feature weight model is trained with the maximum entropy model to obtain the weights. Finally, for each Knowledge Base (KB) ID, a fuzzy clustering matrix is built from the correlation between the KB ID's attributes and the text attributes, the clustering threshold is selected adaptively based on the F statistic, and the texts corresponding to each ID are obtained by clustering. Texts that do not belong to the KB are assigned to the Out and Other types by fuzzy clustering, which completes the name disambiguation. The evaluation result is P = 0.7424, R = 0.7428, F = 0.7426.
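The final step (fuzzy clustering with a threshold chosen by the F statistic) can be illustrated with a minimal sketch. The abstract does not give the paper's similarity function or feature encoding, so everything below is an assumption: texts are hypothetical numeric attribute vectors, `weights` stands in for the maximum-entropy weights, similarity is one minus a weighted mean absolute difference, and the clustering uses a common textbook formulation (max-min transitive closure, then lambda-cuts) rather than the authors' confirmed procedure.

```python
def similarity_matrix(features, weights):
    """Assumed similarity: 1 - weighted mean absolute difference (values in [0, 1])."""
    total = sum(weights)
    return [[1.0 - sum(w * abs(a - b) for w, a, b in zip(weights, x, y)) / total
             for y in features] for x in features]

def transitive_closure(sim):
    """Max-min transitive closure of a reflexive, symmetric fuzzy matrix."""
    n = len(sim)
    cur = [row[:] for row in sim]
    while True:
        nxt = [[max(min(cur[i][k], cur[k][j]) for k in range(n))
                for j in range(n)] for i in range(n)]
        if nxt == cur:  # only min/max are applied, so exact comparison is safe
            return cur
        cur = nxt

def cut_clusters(closure, lam):
    """Lambda-cut: items i and j share a cluster when closure[i][j] >= lam."""
    n = len(closure)
    seen, clusters = set(), []
    for i in range(n):
        if i not in seen:
            members = [j for j in range(n) if closure[i][j] >= lam]
            seen.update(members)
            clusters.append(members)
    return clusters

def f_statistic(features, clusters):
    """Between-cluster over within-cluster variance; None when undefined."""
    n, r = len(features), len(clusters)
    if r <= 1 or r >= n:
        return None
    dim = len(features[0])
    mean = [sum(x[d] for x in features) / n for d in range(dim)]
    between = within = 0.0
    for members in clusters:
        center = [sum(features[i][d] for i in members) / len(members)
                  for d in range(dim)]
        between += len(members) * sum((center[d] - mean[d]) ** 2 for d in range(dim))
        within += sum((features[i][d] - center[d]) ** 2
                      for i in members for d in range(dim))
    if within == 0.0:
        return float("inf")
    return (between / (r - 1)) / (within / (n - r))

def adaptive_clusters(features, weights):
    """Try every lambda level in the closure; keep the cut with the largest F."""
    closure = transitive_closure(similarity_matrix(features, weights))
    best = None  # stays None if every cut is trivial (one cluster or all singletons)
    for lam in sorted({v for row in closure for v in row}, reverse=True):
        clusters = cut_clusters(closure, lam)
        f = f_statistic(features, clusters)
        if f is not None and (best is None or f > best[0]):
            best = (f, lam, clusters)
    return best

# Hypothetical data: five texts described by two attribute scores each.
texts = [[1.0, 0.9], [0.9, 1.0], [0.95, 0.95], [0.1, 0.0], [0.0, 0.1]]
f_value, threshold, groups = adaptive_clusters(texts, [1.0, 1.0])
print(threshold, groups)
```

On this toy input the selected cut separates the first three texts from the last two, because that partition gives a larger F statistic than the finer cut that leaves the last two texts as singletons. Choosing the threshold this way avoids fixing a single cut-off for every name, which is the sense in which the clustering is adaptive.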


Similar resources

Combine Person Name and Person Identity Recognition and Document Clustering for Chinese Person Name Disambiguation

This paper presents the HITSZ_CITYU system in the CIPS-SIGHAN bakeoff 2010 Task 3, Chinese person name disambiguation. This system incorporates person name string recognition, person identity string recognition and an agglomerative hierarchical clustering for grouping the documents to each identical person. Firstly, for the given name index string, three segmentors are applied to segment the se...


Adaptive Resonance Theory Based Two-Stage Chinese Name Disambiguation

It’s common that different individuals share the same name, which makes it time-consuming to search information of a particular individual on the web. Name disambiguation study is necessary to help users find the person of interest more readily. In this paper, we propose an Adaptive Resonance Theory (ART) based two-stage strategy for this problem. We get a first-stage clustering result with ART...


A Novel Method of Text Clustering for Chinese Spam Based on Semantic Body

The effect of spam filtering method based on statistics is not good in filtering the new-type spam with synonymous substitution and camouflage. So a new text clustering method based on Semantic Body for filtering Chinese spam is proposed. In this paper, the word sense disambiguation, lexical chain based on HowNet and statistic-based TFIDF are adopted to extract features of mails. The Semantic B...


Jumping Distance based Chinese Person Name Disambiguation

In this paper, we describe a Chinese person name disambiguation system for news articles and report the results obtained on the data set of the CLP 2010 Bakeoff-3. The main task of the Bakeoff is to identify different persons from the news stories that contain the same person-name string. Compared to the traditional methods, two additional features are used in our system: 1) n-grams co-occurred...


DLUT: Chinese Personal Name Disambiguation with Rich Features

In this paper we describe a person clustering system for a given document set and report the results we have obtained on the test set of Chinese personal name (CPN) disambiguation task of CIPS-SIGHAN 2010. This task consists of clustering a set of Xinhua news documents that mention an ambiguous CPN according to named entity in reality. Several features including named entities (NE) and common no...




Publication date: 2012